
    Batch Policy Learning under Constraints

    When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We thus study the problem of batch policy learning under multiple constraints, and offer a systematic solution. We first propose a flexible meta-algorithm that admits any batch reinforcement learning and online learning procedures as subroutines. We then present a specific algorithmic instantiation and provide performance guarantees for the main objective and all constraints. To certify constraint satisfaction, we propose a new and simple method for off-policy policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves strong empirical results in different domains, including a challenging problem of simulated car driving subject to multiple constraints such as lane keeping and smooth driving. We also show experimentally that our OPE method outperforms other popular OPE techniques on a standalone basis, especially in a high-dimensional setting.

    Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

    Off-policy policy evaluation (OPE) is the problem of estimating the online performance of a policy using only pre-collected historical data generated by another policy. Given the increasing interest in deploying learning-based methods for safety-critical applications, many OPE methods have recently been proposed. Because experimental conditions differ across the recent literature, the relative performance of current OPE methods is not well understood. In this work, we present the first comprehensive empirical analysis of a broad suite of OPE methods. Based on thousands of experiments and detailed empirical analyses, we offer a summarized set of guidelines for effectively using OPE in practice, and suggest directions for future research.

    One-loop corrections to the metastable vacuum decay

    We evaluate the one-loop prefactor in the false vacuum decay rate in a theory of a self-interacting scalar field in 3+1 dimensions. We use a numerical method, established some time ago, which is based on a well-known theorem on functional determinants. The proper handling of zero modes and of renormalization is discussed. The numerical results show in particular that quantum corrections become smaller away from the thin-wall case. In the thin-wall limit the numerical results are found to join onto those obtained by a gradient expansion. Comment: 31 pages, 7 figures.

    Biomechanical considerations in the pathogenesis of osteoarthritis of the knee

    Osteoarthritis is the most common joint disease and a major cause of disability. The knee is the large joint most often affected. While chronological age is the single most important risk factor for osteoarthritis, the pathogenesis of knee osteoarthritis in the young patient is predominantly related to an unfavorable biomechanical environment at the joint. This results in mechanical demand that exceeds the ability of a joint to repair and maintain itself, predisposing the articular cartilage to premature degeneration. This review examines the available basic science, preclinical and clinical evidence regarding several such unfavorable biomechanical conditions about the knee: malalignment, loss of meniscal tissue, cartilage defects and joint instability or laxity.

    Measurements of ψ(2S) and X(3872) → J/ψπ+π− production in pp collisions at √s=8 TeV with the ATLAS detector

    Differential cross sections are presented for the prompt and non-prompt production of the hidden-charm states X(3872) and ψ(2S), in the decay mode J/ψπ+π−, measured using 11.4 fb⁻¹ of pp collisions at √s=8 TeV by the ATLAS detector at the LHC. The ratio of cross sections X(3872)/ψ(2S) is also given, separately for prompt and non-prompt components, as well as the non-prompt fractions of X(3872) and ψ(2S). Assuming independent single effective lifetimes for non-prompt X(3872) and ψ(2S) production gives $R_B = \frac{\mathcal{B}(B \to X(3872) + \text{any})\,\mathcal{B}(X(3872) \to J/\psi\pi^+\pi^-)}{\mathcal{B}(B \to \psi(2S) + \text{any})\,\mathcal{B}(\psi(2S) \to J/\psi\pi^+\pi^-)} = (3.95 \pm 0.32\,(\text{stat}) \pm 0.08\,(\text{sys})) \times 10^{-2}$; separating short- and long-lived contributions, assuming that the short-lived component is due to Bc decays, gives $R_B = (3.57 \pm 0.33\,(\text{stat}) \pm 0.11\,(\text{sys})) \times 10^{-2}$, with the fraction of non-prompt X(3872) produced via Bc decays for pT(X(3872)) > 10 GeV being (25 ± 13(stat) ± 2(sys) ± 5(spin))%. The distributions of the dipion invariant mass in the X(3872) and ψ(2S) decays are also measured and compared to theoretical predictions.